In-context phone posteriors as complementary features for tandem ASR
Abstract
In this paper, we present a method for integrating available prior knowledge (such as phonetic and lexical knowledge) and acoustic context (e.g., the whole utterance) into phone posterior estimation, and we propose using the resulting posteriors as complementary features in a Tandem ASR configuration. These posteriors are estimated according to the HMM state posterior probability definition typically used in standard HMM training. Integrating the appropriate prior knowledge and context in this way enhances the estimation of phone posteriors. We call the new posteriors ‘in-context’ or HMM posteriors. We combine them, as complementary evidence, with the posteriors estimated by a multilayer perceptron (MLP), and use the combined evidence as features for training and inference in the Tandem configuration. Compared with using only MLP-estimated posteriors as Tandem features, this approach improves performance on the OGI Numbers, Conversational Telephone Speech (CTS), and Wall Street Journal (WSJ) databases.
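As a rough illustration of the HMM state posterior definition mentioned in the abstract, the sketch below computes gamma[t][i] = P(q_t = i | whole utterance) with a scaled forward-backward pass and then joins the result with the MLP posteriors. It is not the authors' implementation. Several choices are assumptions made only for this example: the MLP posteriors are used directly as emission scores (uniform phone priors), the transition matrix stands in for the phonetic and lexical prior knowledge, the combination is a plain concatenation of log posteriors, and the names `state_posteriors` and `tandem_features` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def state_posteriors(emis: Matrix, trans: Matrix, init: List[float]) -> Matrix:
    """Scaled forward-backward pass.

    emis[t][i]  : emission score of state i at frame t (assumed > 0)
    trans[i][j] : transition probability from state i to state j
    init[i]     : initial probability of state i
    Returns gamma[t][i] = P(q_t = i | x_1..x_T, model).
    """
    T, N = len(emis), len(init)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]
    scale = [0.0] * T

    # Forward recursion, renormalised at every frame to avoid underflow.
    for i in range(N):
        alpha[0][i] = init[i] * emis[0][i]
    scale[0] = sum(alpha[0])
    alpha[0] = [a / scale[0] for a in alpha[0]]
    for t in range(1, T):
        for j in range(N):
            prev = sum(alpha[t - 1][i] * trans[i][j] for i in range(N))
            alpha[t][j] = emis[t][j] * prev
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]

    # Backward recursion, reusing the forward scaling factors.
    for t in range(T - 2, -1, -1):
        for i in range(N):
            nxt = sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(N))
            beta[t][i] = nxt / scale[t + 1]

    # State posteriors: product of forward and backward terms, normalised per frame.
    gamma = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(N)]
        z = sum(g)
        gamma.append([v / z for v in g])
    return gamma


def tandem_features(mlp_post: Matrix, hmm_post: Matrix, floor: float = 1e-10) -> Matrix:
    """Concatenate log MLP posteriors and log HMM posteriors frame by frame.

    Concatenation is an assumption of this sketch, not the paper's stated rule.
    """
    return [
        [math.log(max(p, floor)) for p in m + h]
        for m, h in zip(mlp_post, hmm_post)
    ]


# Toy example: 2 phones, 3 frames; the middle frame is ambiguous for the MLP.
mlp = [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]
trans = [[0.9, 0.1], [0.1, 0.9]]  # "sticky" transitions act as prior knowledge
init = [0.5, 0.5]

gamma = state_posteriors(mlp, trans, init)
feats = tandem_features(mlp, gamma)
```

In this toy case the MLP alone slightly prefers the second phone at the middle frame, while the in-context posterior, which sees the neighbouring frames and the transition prior, prefers the first. In a Tandem system the concatenated log posteriors would typically be decorrelated (e.g., by PCA/KLT) before being modelled by the HMM/GMM back end.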
Similar resources
Context-dependent phone mapping for LVCSR of under-resourced languages
This paper presents a context-dependent phone mapping approach for acoustic modeling in large-vocabulary speech recognition for under-resourced languages, leveraging well-trained models of other languages. Generally speaking, phone mapping can be considered as a hybrid HMM/MLP (Hidden Markov Model / Multilayer Perceptron) model where the input of the MLP is phone acoustic scores, e.g. like...
Multilingual speech recognition: A posterior based approach
Modern automatic speech recognition (ASR) systems are based on parametric statistical models such as hidden Markov models (HMMs), exploiting 1) acoustic-phonetic models, which need to be trained on large amounts of acoustic data, 2) a language model, which needs to be trained on large amounts of text data, and, finally, 3) a lexicon with phonetic transcriptions, which requires linguistic expertise. ...
Developing and enhancing posterior based speech recognition systems
Local state or phone posterior probabilities are often investigated as local scores (e.g., hybrid HMM/ANN systems) or as transformed acoustic features (e.g., “Tandem”) to improve speech recognition systems. In this paper, we present initial results towards boosting these approaches by improving posterior estimates, using acoustic context (e.g., as available in the whole utterance), as well as p...
Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-Level ASR Posterior Features
Most previous studies on acoustic assessment of disordered voice focused on extracting perturbation features from isolated vowels produced with steady-state phonation. Natural speech, however, is considered preferable in terms of flexibility, effectiveness and reliability for clinical practice. This paper presents an investigation on applying automatic speech recognition (...
Confidence Measures for Tandem Connectionist Feature Extraction
This report proposes and compares a number of tandem-like feature extraction schemes. The proposed schemes use relative phone posteriors as confidence measures estimated from the MLP outputs directly or using Gamma function. The analysis of variances shows that the proposed tandem-like features discriminate better between phone classes than the conventional tandem features. But these capabiliti...